Comparing Italian parsers on a common Treebank: the EVALITA experience

Authors

  • Cristina Bosco
  • Alessandro Mazzei
  • Vincenzo Lombardo
  • Giuseppe Attardi
  • Anna Corazza
  • Alberto Lavelli
  • Leonardo Lesmo
  • Giorgio Satta
  • Maria Simi
Abstract

The Evalita ’07 Parsing Task was the first contest among parsing systems for Italian. It was the first attempt to compare the approaches and results of existing parsing systems for this language on a common treebank annotated in both a dependency and a constituency-based format. The development data set for the competition was taken from the Turin University Treebank, which is annotated in both dependency and constituency format. The evaluation metrics were those standardly applied in CoNLL and PARSEVAL. The parsing results are very promising and higher than the previous state of the art for dependency parsing of Italian. An analysis of these results is provided, which takes into account other experiences in treebank-driven parsing for Italian and for other Romance languages (in particular, the CoNLL-X and CoNLL 2007 shared tasks for dependency parsing). The analysis focuses on the characteristics of the data sets (type of annotation and size) and on the parsing paradigms and approaches, including those applied to languages other than Italian.

Similar resources

Evalita’09 Parsing Task: constituency parsers and the Penn format for Italian

The aim of the Evalita Parsing Task is to define and extend the state of the art for parsing Italian by encouraging the application of existing models and approaches. Therefore, as in the first edition, the Task includes two tracks, i.e. dependency and constituency. This second track is based on a development set in a format, which is an adaptation for Italian of the Penn Treebank format, and ...

Comparing State-of-the-art Dependency Parsers on the Italian Stanford Dependency Treebank

English. In the last decade, many accurate dependency parsers have been made publicly available. It can be difficult for non-experts to select a good off-the-shelf parser among those available. This is even more true when working on languages other than English, because parsers have been tested mainly on English treebanks. Our analysis is focused on Italian and relies on the Italian Stanfor...

Tree Kernels-based Discriminative Reranker for Italian Constituency Parsers

English. This paper aims at filling the gap between the accuracy of Italian and English constituency parsing: firstly, we adapt the Bllip parser, i.e., the most accurate constituency parser for English, also known as Charniak parser, for Italian and train it on the Turin University Treebank (TUT). Secondly, we design a parse reranker based on Support Vector Machines using tree kernels, where ...

VIT – Venice Italian Treebank: Syntactic and Quantitative Features

In this paper we will describe VIT (Venice Italian Treebank), created at the University of Venice. We will focus on the syntactic-semantic features and on the quantitative analysis of the data of our treebank comparing them to other treebanks. In general, we will try to substantiate the claim that treebanking grammars or parsers is dramatically dependent on the chosen treebank; and eventually t...

LICO: A Lexicon of Italian Connectives

English. This paper presents the first release of LICO, a Lexicon for Italian COnnectives. LICO includes about 170 discourse connectives used in Italian, together with their orthographical variants, part of speech(es), semantic relation(s) (according to the Penn Discourse Treebank relation catalogue), and a number of usage examples. Italiano. Questo contributo presenta la prima versione di LICO...

Publication date: 2008